Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 131250 |
| Missing cells | 22740 |
| Missing cells (%) | 1.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 21.1 MiB |
| Average record size in memory | 168.2 B |
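The missing-cell percentage can be cross-checked from the other figures in this table. The sketch below assumes the percentage is taken over every cell (observations × variables); the variable names are illustrative and do not come from the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 16
n_observations = 131250
missing_cells = 22740

# Assumption: the percentage is computed over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Five variables are each reported with 4548 missing values,
# which accounts for every missing cell.
per_variable_missing = 4548
variables_with_missing = 5

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
print(per_variable_missing * variables_with_missing == missing_cells)
```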
Variable types
| Categorical | 5 |
|---|---|
| DateTime | 2 |
| Numeric | 8 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| trip_distance is highly overall correlated with store_and_fwd_flag and 2 other fields | High correlation |
| RatecodeID is highly overall correlated with tolls_amount | High correlation |
| extra is highly overall correlated with Airport_fee | High correlation |
| tolls_amount is highly overall correlated with RatecodeID | High correlation |
| VendorID is highly overall correlated with improvement_surcharge | High correlation |
| store_and_fwd_flag is highly overall correlated with trip_distance | High correlation |
| improvement_surcharge is highly overall correlated with VendorID and 1 other field | High correlation |
| congestion_surcharge is highly overall correlated with trip_distance and 1 other field | High correlation |
| Airport_fee is highly overall correlated with trip_distance and 1 other field | High correlation |
| store_and_fwd_flag is highly imbalanced (94.0%) | Imbalance |
| payment_type is highly imbalanced (55.8%) | Imbalance |
| improvement_surcharge is highly imbalanced (95.5%) | Imbalance |
| congestion_surcharge is highly imbalanced (69.2%) | Imbalance |
| Airport_fee is highly imbalanced (70.8%) | Imbalance |
| passenger_count has 4548 (3.5%) missing values | Missing |
| RatecodeID has 4548 (3.5%) missing values | Missing |
| store_and_fwd_flag has 4548 (3.5%) missing values | Missing |
| congestion_surcharge has 4548 (3.5%) missing values | Missing |
| Airport_fee has 4548 (3.5%) missing values | Missing |
| trip_distance is highly skewed (γ1 = 262.2750407) | Skewed |
| tip_amount has unique values | Unique |
| passenger_count has 2160 (1.6%) zeros | Zeros |
| trip_distance has 1990 (1.5%) zeros | Zeros |
| extra has 38442 (29.3%) zeros | Zeros |
| tolls_amount has 119444 (91.0%) zeros | Zeros |
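The imbalance percentages in the alerts are consistent with one minus the normalised Shannon entropy of the class counts. That reading is an assumption about how the score is derived, not something the report states. The sketch below checks it against two variables, using counts from their Common Values tables; `imbalance_score` is an illustrative helper, not a library function.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the entropy of the class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Non-missing class counts copied from the report.
store_and_fwd_flag = [125819, 883]                  # False, True
payment_type = [101434, 22586, 4548, 1766, 916]     # Credit Card, Cash, Wallet, unknown, UPI

print(f"store_and_fwd_flag: {imbalance_score(store_and_fwd_flag):.1%}")
print(f"payment_type: {imbalance_score(payment_type):.1%}")
```

Both values round to the figures shown in the alerts (94.0% and 55.8%).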
Reproduction
| Analysis started | 2023-11-11 17:05:48.242166 |
|---|---|
| Analysis finished | 2023-11-11 17:06:12.076019 |
| Duration | 23.83 seconds |
| Software version | ydata-profiling v4.6.0 |
| Download configuration | config.json |
VendorID
Categorical
HIGH CORRELATION 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.0 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 131250 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 95498 | 72.8% |
| 0 | 35703 | 27.2% |
| 2 | 49 | < 0.1% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 95498 | 72.8% |
| 0 | 35703 | 27.2% |
| 2 | 49 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 95498 | 72.8% |
| 0 | 35703 | 27.2% |
| 2 | 49 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 131250 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 95498 | 72.8% |
| 0 | 35703 | 27.2% |
| 2 | 49 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 131250 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 95498 | 72.8% |
| 0 | 35703 | 27.2% |
| 2 | 49 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 131250 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 95498 | 72.8% |
| 0 | 35703 | 27.2% |
| 2 | 49 | < 0.1% |
tpep_pickup_datetime
DateTime
| Distinct | 91592 |
|---|---|
| Distinct (%) | 69.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.0 MiB |
| Minimum | 2023-06-28 15:28:01 |
| Maximum | 2023-07-01 00:58:11 |
tpep_dropoff_datetime
DateTime
| Distinct | 91377 |
|---|---|
| Distinct (%) | 69.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.0 MiB |
| Minimum | 2023-06-28 15:32:43 |
| Maximum | 2023-07-01 23:10:43 |
passenger_count
Real number (ℝ)
MISSING  ZEROS 
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 4548 |
| Missing (%) | 3.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3580764 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 2160 |
| Zeros (%) | 1.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.8924777 |
|---|---|
| Coefficient of variation (CV) | 0.65716309 |
| Kurtosis | 9.4602262 |
| Mean | 1.3580764 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.8667044 |
| Sum | 172071 |
| Variance | 0.79651645 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 96306 | 73.4% |
| 2 | 18281 | 13.9% |
| 3 | 4513 | 3.4% |
| 4 | 2751 | 2.1% |
| 0 | 2160 | 1.6% |
| 5 | 1490 | 1.1% |
| 6 | 1199 | 0.9% |
| 8 | 2 | < 0.1% |
| (Missing) | 4548 | 3.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2160 | 1.6% |
| 1 | 96306 | 73.4% |
| 2 | 18281 | 13.9% |
| 3 | 4513 | 3.4% |
| 4 | 2751 | 2.1% |
| 5 | 1490 | 1.1% |
| 6 | 1199 | 0.9% |
| 8 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 2 | < 0.1% |
| 6 | 1199 | 0.9% |
| 5 | 1490 | 1.1% |
| 4 | 2751 | 2.1% |
| 3 | 4513 | 3.4% |
| 2 | 18281 | 13.9% |
| 1 | 96306 | 73.4% |
| 0 | 2160 | 1.6% |
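The reported Sum and Mean for passenger_count can be reproduced from the value counts, which also shows that the mean is taken over non-missing rows only. This is a minimal sketch using counts copied from the table above; the variable names are illustrative.

```python
# Value counts for passenger_count (the 4548 missing rows are excluded).
counts = {1: 96306, 2: 18281, 3: 4513, 4: 2751, 0: 2160, 5: 1490, 6: 1199, 8: 2}

n_present = sum(counts.values())                        # non-missing observations
total = sum(value * count for value, count in counts.items())
mean = total / n_present

print(n_present, total, round(mean, 4))
```

Here `n_present` equals 131250 − 4548, so dividing the Sum by the full row count would understate the mean.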
trip_distance
Real number (ℝ)
HIGH CORRELATION  SKEWED  ZEROS 
| Distinct | 2802 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.6461714 |
| Minimum | 0 |
| Maximum | 135182.06 |
| Zeros | 1990 |
| Zeros (%) | 1.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.48 |
| Q1 | 1.08 |
| median | 1.84 |
| Q3 | 3.63 |
| 95-th percentile | 16.67 |
| Maximum | 135182.06 |
| Range | 135182.06 |
| Interquartile range (IQR) | 2.55 |
Descriptive statistics
| Standard deviation | 456.06402 |
|---|---|
| Coefficient of variation (CV) | 80.774031 |
| Kurtosis | 71610.377 |
| Mean | 5.6461714 |
| Median Absolute Deviation (MAD) | 0.96 |
| Skewness | 262.27504 |
| Sum | 741059.99 |
| Variance | 207994.39 |
| Monotonicity | Not monotonic |
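Several of the trip_distance statistics are derived from one another, which gives a quick consistency check. The sketch below uses the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1) with values copied from the tables above.

```python
# Values copied from the trip_distance tables.
std = 456.06402
mean = 5.6461714
q1, q3 = 1.08, 3.63

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the square of the standard deviation
iqr = q3 - q1            # interquartile range

print(round(cv, 3), round(variance, 2), round(iqr, 2))
```

A CV of about 80 against a median of 1.84 reflects the handful of extreme values (up to 135182.06) that dominate the standard deviation.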
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1990 | 1.5% |
| 1 | 1824 | 1.4% |
| 1.2 | 1756 | 1.3% |
| 0.8 | 1738 | 1.3% |
| 1.1 | 1727 | 1.3% |
| 0.9 | 1725 | 1.3% |
| 0.7 | 1642 | 1.3% |
| 1.4 | 1579 | 1.2% |
| 1.3 | 1565 | 1.2% |
| 1.5 | 1518 | 1.2% |
| Other values (2792) | 114186 | 87.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1990 | 1.5% |
| 0.01 | 128 | 0.1% |
| 0.02 | 81 | 0.1% |
| 0.03 | 73 | 0.1% |
| 0.04 | 34 | < 0.1% |
| 0.05 | 47 | < 0.1% |
| 0.06 | 35 | < 0.1% |
| 0.07 | 34 | < 0.1% |
| 0.08 | 27 | < 0.1% |
| 0.09 | 18 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 135182.06 | 1 | < 0.1% |
| 92292.43 | 1 | < 0.1% |
| 20314 | 1 | < 0.1% |
| 9673.69 | 1 | < 0.1% |
| 143.35 | 1 | < 0.1% |
| 104.09 | 1 | < 0.1% |
| 84.16 | 1 | < 0.1% |
| 83.69 | 1 | < 0.1% |
| 79.55 | 1 | < 0.1% |
| 73.23 | 1 | < 0.1% |
RatecodeID
Real number (ℝ)
HIGH CORRELATION  MISSING 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 4548 |
| Missing (%) | 3.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.5172847 |
| Minimum | 1 |
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 6.497366 |
|---|---|
| Coefficient of variation (CV) | 4.2822327 |
| Kurtosis | 220.17059 |
| Mean | 1.5172847 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 14.874864 |
| Sum | 192243 |
| Variance | 42.215765 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 118931 | 90.6% |
| 2 | 5517 | 4.2% |
| 5 | 805 | 0.6% |
| 99 | 558 | 0.4% |
| 3 | 553 | 0.4% |
| 4 | 338 | 0.3% |
| (Missing) | 4548 | 3.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 118931 | 90.6% |
| 2 | 5517 | 4.2% |
| 3 | 553 | 0.4% |
| 4 | 338 | 0.3% |
| 5 | 805 | 0.6% |
| 99 | 558 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 558 | 0.4% |
| 5 | 805 | 0.6% |
| 4 | 338 | 0.3% |
| 3 | 553 | 0.4% |
| 2 | 5517 | 4.2% |
| 1 | 118931 | 90.6% |
store_and_fwd_flag
Boolean
HIGH CORRELATION  IMBALANCE  MISSING 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 4548 |
| Missing (%) | 3.5% |
| Memory size | 5.3 MiB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 125819 | 95.9% |
| True | 883 | 0.7% |
| (Missing) | 4548 | 3.5% |
PULocationID
Real number (ℝ)
| Distinct | 264 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 132.86859 |
| Minimum | 1 |
| Maximum | 264 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 67 |
| median | 133 |
| Q3 | 199 |
| 95-th percentile | 251 |
| Maximum | 264 |
| Range | 263 |
| Interquartile range (IQR) | 132 |
Descriptive statistics
| Standard deviation | 76.201612 |
|---|---|
| Coefficient of variation (CV) | 0.57351109 |
| Kurtosis | -1.1970131 |
| Mean | 132.86859 |
| Median Absolute Deviation (MAD) | 66 |
| Skewness | -0.0077132294 |
| Sum | 17439003 |
| Variance | 5806.6857 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 558 | 0.4% |
| 117 | 549 | 0.4% |
| 247 | 548 | 0.4% |
| 146 | 548 | 0.4% |
| 93 | 545 | 0.4% |
| 171 | 545 | 0.4% |
| 43 | 544 | 0.4% |
| 101 | 542 | 0.4% |
| 162 | 540 | 0.4% |
| 264 | 540 | 0.4% |
| Other values (254) | 125791 | 95.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 500 | 0.4% |
| 2 | 483 | 0.4% |
| 3 | 513 | 0.4% |
| 4 | 491 | 0.4% |
| 5 | 509 | 0.4% |
| 6 | 491 | 0.4% |
| 7 | 493 | 0.4% |
| 8 | 558 | 0.4% |
| 9 | 502 | 0.4% |
| 10 | 473 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 264 | 540 | 0.4% |
| 263 | 483 | 0.4% |
| 262 | 488 | 0.4% |
| 261 | 494 | 0.4% |
| 260 | 514 | 0.4% |
| 259 | 499 | 0.4% |
| 258 | 506 | 0.4% |
| 257 | 502 | 0.4% |
| 256 | 531 | 0.4% |
| 255 | 493 | 0.4% |
DOLocationID
Real number (ℝ)
| Distinct | 264 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 132.8411 |
| Minimum | 1 |
| Maximum | 264 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 67 |
| median | 133 |
| Q3 | 199 |
| 95-th percentile | 252 |
| Maximum | 264 |
| Range | 263 |
| Interquartile range (IQR) | 132 |
Descriptive statistics
| Standard deviation | 76.16586 |
|---|---|
| Coefficient of variation (CV) | 0.57336066 |
| Kurtosis | -1.1969142 |
| Mean | 132.8411 |
| Median Absolute Deviation (MAD) | 66 |
| Skewness | -0.0037655891 |
| Sum | 17435394 |
| Variance | 5801.2382 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 263 | 575 | 0.4% |
| 127 | 571 | 0.4% |
| 167 | 548 | 0.4% |
| 98 | 546 | 0.4% |
| 247 | 543 | 0.4% |
| 14 | 543 | 0.4% |
| 226 | 542 | 0.4% |
| 115 | 541 | 0.4% |
| 153 | 540 | 0.4% |
| 231 | 539 | 0.4% |
| Other values (254) | 125762 | 95.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 476 | 0.4% |
| 2 | 502 | 0.4% |
| 3 | 456 | 0.3% |
| 4 | 515 | 0.4% |
| 5 | 496 | 0.4% |
| 6 | 477 | 0.4% |
| 7 | 510 | 0.4% |
| 8 | 476 | 0.4% |
| 9 | 463 | 0.4% |
| 10 | 478 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 264 | 522 | 0.4% |
| 263 | 575 | 0.4% |
| 262 | 487 | 0.4% |
| 261 | 514 | 0.4% |
| 260 | 483 | 0.4% |
| 259 | 509 | 0.4% |
| 258 | 460 | 0.4% |
| 257 | 503 | 0.4% |
| 256 | 512 | 0.4% |
| 255 | 481 | 0.4% |
payment_type
Categorical
IMBALANCE 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.0 MiB |
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 9.5125029 |
| Min length | 3 |
Characters and Unicode
| Total characters | 1248516 |
|---|---|
| Distinct characters | 20 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Credit Card |
|---|---|
| 2nd row | Credit Card |
| 3rd row | Credit Card |
| 4th row | Credit Card |
| 5th row | Credit Card |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Credit Card | 101434 | 77.3% |
| Cash | 22586 | 17.2% |
| Wallet | 4548 | 3.5% |
| unknown | 1766 | 1.3% |
| UPI | 916 | 0.7% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| credit | 101434 | 43.6% |
| card | 101434 | 43.6% |
| cash | 22586 | 9.7% |
| wallet | 4548 | 2.0% |
| unknown | 1766 | 0.8% |
| upi | 916 | 0.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| C | 225454 | 18.1% |
| d | 202868 | 16.2% |
| r | 202868 | 16.2% |
| a | 128568 | 10.3% |
| e | 105982 | 8.5% |
| t | 105982 | 8.5% |
| i | 101434 | 8.1% |
| (space) | 101434 | 8.1% |
| s | 22586 | 1.8% |
| h | 22586 | 1.8% |
| Other values (10) | 28754 | 2.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 914332 | 73.2% |
| Uppercase Letter | 232750 | 18.6% |
| Space Separator | 101434 | 8.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| d | 202868 | 22.2% |
| r | 202868 | 22.2% |
| a | 128568 | 14.1% |
| e | 105982 | 11.6% |
| t | 105982 | 11.6% |
| i | 101434 | 11.1% |
| s | 22586 | 2.5% |
| h | 22586 | 2.5% |
| l | 9096 | 1.0% |
| n | 5298 | 0.6% |
| Other values (4) | 7064 | 0.8% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| C | 225454 | 96.9% |
| W | 4548 | 2.0% |
| U | 916 | 0.4% |
| P | 916 | 0.4% |
| I | 916 | 0.4% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 101434 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1147082 | 91.9% |
| Common | 101434 | 8.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| C | 225454 | 19.7% |
| d | 202868 | 17.7% |
| r | 202868 | 17.7% |
| a | 128568 | 11.2% |
| e | 105982 | 9.2% |
| t | 105982 | 9.2% |
| i | 101434 | 8.8% |
| s | 22586 | 2.0% |
| h | 22586 | 2.0% |
| l | 9096 | 0.8% |
| Other values (9) | 19658 | 1.7% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 101434 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1248516 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| C | 225454 | 18.1% |
| d | 202868 | 16.2% |
| r | 202868 | 16.2% |
| a | 128568 | 10.3% |
| e | 105982 | 8.5% |
| t | 105982 | 8.5% |
| i | 101434 | 8.1% |
| (space) | 101434 | 8.1% |
| s | 22586 | 1.8% |
| h | 22586 | 1.8% |
| Other values (10) | 28754 | 2.3% |
extra
Real number (ℝ)
HIGH CORRELATION  ZEROS 
| Distinct | 28 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.9341551 |
| Minimum | -7.5 |
| Maximum | 11.75 |
| Zeros | 38442 |
| Zeros (%) | 29.3% |
| Negative | 843 |
| Negative (%) | 0.6% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | -7.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2.5 |
| 95-th percentile | 5 |
| Maximum | 11.75 |
| Range | 19.25 |
| Interquartile range (IQR) | 2.5 |
Descriptive statistics
| Standard deviation | 1.9519249 |
|---|---|
| Coefficient of variation (CV) | 1.0091873 |
| Kurtosis | 1.9513187 |
| Mean | 1.9341551 |
| Median Absolute Deviation (MAD) | 1.5 |
| Skewness | 1.0989403 |
| Sum | 253857.86 |
| Variance | 3.8100107 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 38442 | 29.3% |
| 2.5 | 37622 | 28.7% |
| 1 | 26666 | 20.3% |
| 5 | 13002 | 9.9% |
| 3.5 | 8799 | 6.7% |
| 7.5 | 1771 | 1.3% |
| 6 | 1332 | 1.0% |
| 4.25 | 679 | 0.5% |
| 9.25 | 617 | 0.5% |
| -1 | 395 | 0.3% |
| Other values (18) | 1925 | 1.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -7.5 | 14 | < 0.1% |
| -6 | 15 | < 0.1% |
| -5 | 57 | < 0.1% |
| -2.5 | 362 | 0.3% |
| -1 | 395 | 0.3% |
| 0 | 38442 | 29.3% |
| 0.11 | 1 | < 0.1% |
| 0.25 | 1 | < 0.1% |
| 0.75 | 2 | < 0.1% |
| 1 | 26666 | 20.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.75 | 196 | 0.1% |
| 10.25 | 175 | 0.1% |
| 10 | 76 | 0.1% |
| 9.25 | 617 | 0.5% |
| 8.5 | 42 | < 0.1% |
| 7.75 | 155 | 0.1% |
| 7.5 | 1771 | 1.3% |
| 6.75 | 210 | 0.2% |
| 6 | 1332 | 1.0% |
| 5.25 | 7 | < 0.1% |
tip_amount
Real number (ℝ)
UNIQUE 
| Distinct | 131250 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.1329069 |
| Minimum | 0.00012939597 |
| Maximum | 484.87615 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | 0.00012939597 |
|---|---|
| 5-th percentile | 1.025461 |
| Q1 | 3.4761096 |
| median | 5.2979448 |
| Q3 | 7.5146978 |
| 95-th percentile | 15.118223 |
| Maximum | 484.87615 |
| Range | 484.87602 |
| Interquartile range (IQR) | 4.0385882 |
Descriptive statistics
| Standard deviation | 4.6573437 |
|---|---|
| Coefficient of variation (CV) | 0.75940231 |
| Kurtosis | 893.83261 |
| Mean | 6.1329069 |
| Median Absolute Deviation (MAD) | 1.9947844 |
| Skewness | 11.313853 |
| Sum | 804944.04 |
| Variance | 21.69085 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.776779595 | 1 | < 0.1% |
| 6.008778287 | 1 | < 0.1% |
| 3.149878845 | 1 | < 0.1% |
| 18.87838917 | 1 | < 0.1% |
| 1.105655264 | 1 | < 0.1% |
| 4.557259746 | 1 | < 0.1% |
| 5.127701094 | 1 | < 0.1% |
| 1.290491412 | 1 | < 0.1% |
| 6.943745452 | 1 | < 0.1% |
| 6.652131774 | 1 | < 0.1% |
| Other values (131240) | 131240 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0001293959739 | 1 | < 0.1% |
| 0.0002753208 | 1 | < 0.1% |
| 0.0004350792141 | 1 | < 0.1% |
| 0.0004601416502 | 1 | < 0.1% |
| 0.0007132443445 | 1 | < 0.1% |
| 0.0008942244453 | 1 | < 0.1% |
| 0.0009188518326 | 1 | < 0.1% |
| 0.001051727527 | 1 | < 0.1% |
| 0.001095234922 | 1 | < 0.1% |
| 0.00109933439 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 484.8761506 | 1 | < 0.1% |
| 184.3134585 | 1 | < 0.1% |
| 170.761373 | 1 | < 0.1% |
| 110.9277285 | 1 | < 0.1% |
| 101.3168704 | 1 | < 0.1% |
| 91.16362159 | 1 | < 0.1% |
| 84.47570597 | 1 | < 0.1% |
| 84.03261678 | 1 | < 0.1% |
| 80.82858772 | 1 | < 0.1% |
| 76.40339182 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
HIGH CORRELATION  ZEROS 
| Distinct | 184 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6505181 |
| Minimum | -29.3 |
| Maximum | 80 |
| Zeros | 119444 |
| Zeros (%) | 91.0% |
| Negative | 92 |
| Negative (%) | 0.1% |
| Memory size | 6.0 MiB |
Quantile statistics
| Minimum | -29.3 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 6.55 |
| Maximum | 80 |
| Range | 109.3 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.3364782 |
|---|---|
| Coefficient of variation (CV) | 3.591719 |
| Kurtosis | 54.933921 |
| Mean | 0.6505181 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.2022792 |
| Sum | 85380.5 |
| Variance | 5.4591303 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 119444 | 91.0% |
| 6.55 | 10548 | 8.0% |
| 12.75 | 185 | 0.1% |
| 14.75 | 169 | 0.1% |
| 3 | 99 | 0.1% |
| -6.55 | 78 | 0.1% |
| 19.3 | 66 | 0.1% |
| 13.1 | 64 | < 0.1% |
| 21.3 | 41 | < 0.1% |
| 2.45 | 36 | < 0.1% |
| Other values (174) | 520 | 0.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -29.3 | 1 | < 0.1% |
| -21.3 | 1 | < 0.1% |
| -14.75 | 1 | < 0.1% |
| -12.75 | 1 | < 0.1% |
| -12.55 | 1 | < 0.1% |
| -10.5 | 1 | < 0.1% |
| -10 | 2 | < 0.1% |
| -8.55 | 1 | < 0.1% |
| -8.5 | 1 | < 0.1% |
| -8.3 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 80 | 1 | < 0.1% |
| 76 | 1 | < 0.1% |
| 63 | 1 | < 0.1% |
| 53 | 1 | < 0.1% |
| 47.25 | 1 | < 0.1% |
| 45.15 | 1 | < 0.1% |
| 42.55 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 38.25 | 1 | < 0.1% |
| 36.05 | 1 | < 0.1% |
improvement_surcharge
Categorical
HIGH CORRELATION  IMBALANCE 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.0 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0100114 |
| Min length | 3 |
Characters and Unicode
| Total characters | 395064 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 129838 | 98.9% |
| -1.0 | 1314 | 1.0% |
| 0.3 | 64 | < 0.1% |
| 0.0 | 34 | < 0.1% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 131152 | 99.9% |
| 0.3 | 64 | < 0.1% |
| 0.0 | 34 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 131284 | 33.2% |
| . | 131250 | 33.2% |
| 1 | 131152 | 33.2% |
| - | 1314 | 0.3% |
| 3 | 64 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 262500 | 66.4% |
| Other Punctuation | 131250 | 33.2% |
| Dash Punctuation | 1314 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 131284 | 50.0% |
| 1 | 131152 | 50.0% |
| 3 | 64 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 131250 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 1314 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 395064 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 131284 | 33.2% |
| . | 131250 | 33.2% |
| 1 | 131152 | 33.2% |
| - | 1314 | 0.3% |
| 3 | 64 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 395064 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 131284 | 33.2% |
| . | 131250 | 33.2% |
| 1 | 131152 | 33.2% |
| - | 1314 | 0.3% |
| 3 | 64 | < 0.1% |
congestion_surcharge
Categorical
HIGH CORRELATION  IMBALANCE  MISSING 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 4548 |
| Missing (%) | 3.5% |
| Memory size | 6.0 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0083108 |
| Min length | 3 |
Characters and Unicode
| Total characters | 381159 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.5 |
|---|---|
| 2nd row | 2.5 |
| 3rd row | 0.0 |
| 4th row | 2.5 |
| 5th row | 2.5 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 114843 | 87.5% |
| 0.0 | 10806 | 8.2% |
| -2.5 | 1053 | 0.8% |
| (Missing) | 4548 | 3.5% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 115896 | 91.5% |
| 0.0 | 10806 | 8.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 126702 | 33.2% |
| 2 | 115896 | 30.4% |
| 5 | 115896 | 30.4% |
| 0 | 21612 | 5.7% |
| - | 1053 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 253404 | 66.5% |
| Other Punctuation | 126702 | 33.2% |
| Dash Punctuation | 1053 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 115896 | 45.7% |
| 5 | 115896 | 45.7% |
| 0 | 21612 | 8.5% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 126702 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 1053 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 381159 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| . | 126702 | 33.2% |
| 2 | 115896 | 30.4% |
| 5 | 115896 | 30.4% |
| 0 | 21612 | 5.7% |
| - | 1053 | 0.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 381159 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| . | 126702 | 33.2% |
| 2 | 115896 | 30.4% |
| 5 | 115896 | 30.4% |
| 0 | 21612 | 5.7% |
| - | 1053 | 0.3% |
Airport_fee
Categorical
HIGH CORRELATION  IMBALANCE  MISSING 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 4548 |
| Missing (%) | 3.5% |
| Memory size | 6.0 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 3.0962258 |
| Min length | 3 |
Characters and Unicode
| Total characters | 392298 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 3 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 114702 | 87.4% |
| 1.75 | 11808 | 9.0% |
| -1.75 | 192 | 0.1% |
| (Missing) | 4548 | 3.5% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 114702 | 90.5% |
| 1.75 | 12000 | 9.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 229404 | 58.5% |
| . | 126702 | 32.3% |
| 1 | 12000 | 3.1% |
| 7 | 12000 | 3.1% |
| 5 | 12000 | 3.1% |
| - | 192 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 265404 | 67.7% |
| Other Punctuation | 126702 | 32.3% |
| Dash Punctuation | 192 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 229404 | 86.4% |
| 1 | 12000 | 4.5% |
| 7 | 12000 | 4.5% |
| 5 | 12000 | 4.5% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 126702 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 192 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 392298 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 229404 | 58.5% |
| . | 126702 | 32.3% |
| 1 | 12000 | 3.1% |
| 7 | 12000 | 3.1% |
| 5 | 12000 | 3.1% |
| - | 192 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 392298 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 229404 | 58.5% |
| . | 126702 | 32.3% |
| 1 | 12000 | 3.1% |
| 7 | 12000 | 3.1% |
| 5 | 12000 | 3.1% |
| - | 192 | < 0.1% |
| | passenger_count | trip_distance | RatecodeID | PULocationID | DOLocationID | extra | tip_amount | tolls_amount | VendorID | store_and_fwd_flag | payment_type | improvement_surcharge | congestion_surcharge | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| passenger_count | 1.000 | 0.064 | 0.079 | 0.001 | 0.000 | -0.029 | 0.009 | 0.065 | 0.228 | 0.058 | 0.039 | 0.014 | 0.017 | 0.050 |
| trip_distance | 0.064 | 1.000 | 0.292 | -0.002 | -0.005 | 0.083 | 0.382 | 0.442 | 0.000 | 1.000 | 0.013 | 0.000 | 1.000 | 1.000 |
| RatecodeID | 0.079 | 0.292 | 1.000 | 0.005 | -0.002 | -0.084 | 0.138 | 0.525 | 0.108 | 0.002 | 0.033 | 0.023 | 0.218 | 0.021 |
| PULocationID | 0.001 | -0.002 | 0.005 | 1.000 | 0.001 | -0.000 | -0.001 | 0.002 | 0.000 | 0.003 | 0.000 | 0.000 | 0.000 | 0.003 |
| DOLocationID | 0.000 | -0.005 | -0.002 | 0.001 | 1.000 | 0.002 | -0.000 | -0.002 | 0.000 | 0.000 | 0.000 | 0.004 | 0.002 | 0.000 |
| extra | -0.029 | 0.083 | -0.084 | -0.000 | 0.002 | 1.000 | 0.101 | 0.139 | 0.412 | 0.075 | 0.205 | 0.342 | 0.396 | 0.516 |
| tip_amount | 0.009 | 0.382 | 0.138 | -0.001 | -0.000 | 0.101 | 1.000 | 0.256 | 0.000 | 0.000 | 0.000 | 0.000 | 0.033 | 0.017 |
| tolls_amount | 0.065 | 0.442 | 0.525 | 0.002 | -0.002 | 0.139 | 0.256 | 1.000 | 0.023 | 0.015 | 0.042 | 0.057 | 0.156 | 0.382 |
| VendorID | 0.228 | 0.000 | 0.108 | 0.000 | 0.000 | 0.412 | 0.000 | 0.023 | 1.000 | 0.125 | 0.088 | 0.620 | 0.057 | 0.045 |
| store_and_fwd_flag | 0.058 | 1.000 | 0.002 | 0.003 | 0.000 | 0.075 | 0.000 | 0.015 | 0.125 | 1.000 | 0.025 | 0.031 | 0.009 | 0.002 |
| payment_type | 0.039 | 0.013 | 0.033 | 0.000 | 0.000 | 0.205 | 0.000 | 0.042 | 0.088 | 0.025 | 1.000 | 0.322 | 0.364 | 0.145 |
| improvement_surcharge | 0.014 | 0.000 | 0.023 | 0.000 | 0.004 | 0.342 | 0.000 | 0.057 | 0.620 | 0.031 | 0.322 | 1.000 | 0.635 | 0.270 |
| congestion_surcharge | 0.017 | 1.000 | 0.218 | 0.000 | 0.002 | 0.396 | 0.033 | 0.156 | 0.057 | 0.009 | 0.364 | 0.635 | 1.000 | 0.329 |
| Airport_fee | 0.050 | 1.000 | 0.021 | 0.003 | 0.000 | 0.516 | 0.017 | 0.382 | 0.045 | 0.002 | 0.145 | 0.270 | 0.329 | 1.000 |
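The high-correlation alerts can be traced back to this matrix by filtering for large coefficients. The sketch below assumes a cut-off of 0.5, which is an assumption rather than a value stated in the report, and uses a hand-copied subset of pairs from the matrix; `THRESHOLD` and `pairs` are illustrative names.

```python
THRESHOLD = 0.5  # assumed cut-off for a "high" correlation

# A subset of off-diagonal coefficients copied from the matrix above.
pairs = {
    ("trip_distance", "store_and_fwd_flag"): 1.000,
    ("trip_distance", "congestion_surcharge"): 1.000,
    ("trip_distance", "Airport_fee"): 1.000,
    ("trip_distance", "tolls_amount"): 0.442,
    ("RatecodeID", "tolls_amount"): 0.525,
    ("extra", "Airport_fee"): 0.516,
    ("extra", "VendorID"): 0.412,
    ("VendorID", "improvement_surcharge"): 0.620,
    ("improvement_surcharge", "congestion_surcharge"): 0.635,
    ("payment_type", "congestion_surcharge"): 0.364,
}

# Keep only the pairs at or above the threshold.
high = sorted(pair for pair, r in pairs.items() if abs(r) >= THRESHOLD)

for a, b in high:
    print(f"{a} <-> {b}: {pairs[(a, b)]:.3f}")
```

Under this threshold the seven selected pairs are the ones named in the alerts, while pairs such as trip_distance and tolls_amount (0.442) fall just below it.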
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | extra | tip_amount | tolls_amount | improvement_surcharge | congestion_surcharge | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 94366 | 0 | 2023-06-29 18:18:35 | 2023-06-29 19:27:45 | 1.0 | 1.30 | 1.0 | N | 33 | 137 | Credit Card | 5.0 | 4.776780 | 0.00 | 1.0 | 2.5 | 0.0 |
| 11777 | 0 | 2023-06-30 12:18:42 | 2023-06-30 12:33:14 | 1.0 | 10.80 | 1.0 | N | 73 | 114 | Credit Card | 7.5 | 8.476391 | 6.55 | 1.0 | 2.5 | 0.0 |
| 33330 | 1 | 2023-06-29 10:30:27 | 2023-06-29 10:42:52 | 0.0 | 0.00 | 5.0 | N | 238 | 45 | Credit Card | 0.0 | 1.192649 | 6.55 | 1.0 | 0.0 | 0.0 |
| 158253 | 1 | 2023-06-28 18:10:27 | 2023-06-28 17:23:27 | 1.0 | 3.65 | 1.0 | N | 3 | 89 | Credit Card | 2.5 | 8.272961 | 0.00 | 1.0 | 2.5 | 0.0 |
| 114020 | 1 | 2023-06-29 07:53:01 | 2023-06-29 08:46:26 | 1.0 | 3.43 | 1.0 | N | 141 | 233 | Credit Card | 0.0 | 7.102520 | 0.00 | 1.0 | 2.5 | 0.0 |
| 169335 | 1 | 2023-06-29 13:13:51 | 2023-06-29 14:25:39 | 2.0 | 1.58 | 1.0 | N | 5 | 170 | Credit Card | 0.0 | 3.339151 | 0.00 | 1.0 | 2.5 | 0.0 |
| 9720 | 1 | 2023-06-29 23:02:27 | 2023-06-29 22:34:29 | 2.0 | 1.22 | 1.0 | N | 171 | 167 | Credit Card | 1.0 | 4.013205 | 0.00 | 1.0 | 2.5 | 0.0 |
| 8626 | 0 | 2023-06-30 13:31:11 | 2023-06-30 13:32:41 | 1.0 | 1.90 | 1.0 | N | 144 | 46 | Credit Card | 2.5 | 6.681873 | 0.00 | 1.0 | 2.5 | 0.0 |
| 35712 | 1 | 2023-06-30 00:31:39 | 2023-06-30 01:48:49 | 1.0 | 4.59 | 1.0 | N | 35 | 146 | Credit Card | 1.0 | 9.839112 | 0.00 | 1.0 | 2.5 | 0.0 |
| 96572 | 1 | 2023-06-30 17:24:46 | 2023-06-30 16:42:25 | 1.0 | 1.72 | 1.0 | N | 84 | 178 | Cash | 2.5 | 2.740825 | 0.00 | 1.0 | 2.5 | 0.0 |
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | extra | tip_amount | tolls_amount | improvement_surcharge | congestion_surcharge | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9909 | 0 | 2023-06-30 06:37:07 | 2023-06-30 06:18:39 | 0.0 | 1.60 | 1.0 | N | 32 | 34 | Credit Card | 2.50 | 1.762231 | 0.00 | 1.0 | 2.5 | 0.00 |
| 21453 | 0 | 2023-06-29 07:29:23 | 2023-06-29 06:36:36 | 1.0 | 1.40 | 1.0 | N | 121 | 225 | Cash | 2.50 | 2.806821 | 0.00 | 1.0 | 2.5 | 0.00 |
| 150810 | 1 | 2023-06-28 17:37:38 | 2023-06-28 19:07:47 | 1.0 | 9.88 | 1.0 | N | 102 | 149 | Credit Card | 7.50 | 5.991983 | 6.55 | 1.0 | 2.5 | 1.75 |
| 101424 | 1 | 2023-06-30 18:17:02 | 2023-06-30 18:13:41 | 1.0 | 1.64 | 1.0 | N | 219 | 73 | Credit Card | 2.50 | 4.221079 | 0.00 | 1.0 | 2.5 | 0.00 |
| 152252 | 1 | 2023-06-29 20:20:16 | 2023-06-29 21:41:02 | 1.0 | 0.97 | 1.0 | N | 12 | 68 | Credit Card | 1.00 | 3.513716 | 0.00 | 1.0 | 2.5 | 0.00 |
| 8852 | 0 | 2023-06-29 13:57:29 | 2023-06-29 13:27:25 | 1.0 | 0.30 | 1.0 | N | 176 | 167 | Cash | 2.50 | 3.780324 | 0.00 | 1.0 | 2.5 | 0.00 |
| 69752 | 1 | 2023-06-29 09:05:51 | 2023-06-29 09:30:17 | 2.0 | 6.81 | 1.0 | N | 68 | 11 | Credit Card | 0.00 | 0.695260 | 0.00 | 1.0 | 2.5 | 0.00 |
| 152827 | 1 | 2023-06-30 15:40:44 | 2023-06-30 15:06:36 | 1.0 | 1.37 | 1.0 | N | 186 | 198 | Credit Card | 0.00 | 3.963555 | 0.00 | 1.0 | 2.5 | 0.00 |
| 80321 | 0 | 2023-06-28 18:22:08 | 2023-06-28 19:48:18 | 0.0 | 9.30 | 1.0 | N | 164 | 118 | Credit Card | 11.75 | 13.466104 | 6.55 | 1.0 | 2.5 | 1.75 |
| 90215 | 1 | 2023-06-29 17:24:18 | 2023-06-29 19:16:29 | NaN | 1.76 | NaN | NaN | 218 | 202 | Wallet | 0.00 | 4.901728 | 0.00 | 1.0 | NaN | NaN |